147 research outputs found

    Belief propagation generative adversarial networks

    Generative adversarial networks (GANs) are a class of generative models based on a minimax game. They have led to significant improvements in unsupervised learning, especially image generation. However, most work on GANs learns the distribution of the input dataset through a multi-layer neural network that does not explicitly model the structure of the input variables. This may work well on large, relatively clean datasets, with the expectation that the learning procedure assigns small weights to occasional noise by averaging over many inputs, but it can suffer when the dataset is small or noisy, picking up spurious structure and reducing the quality of generated samples. In this thesis we propose a technique that models the structure of variable interactions by incorporating graphical models into the generative adversarial network. The proposed framework produces samples by passing random inputs through a neural network to construct the local potentials of a graphical model; performing probabilistic inference in this graphical model then yields the marginal distribution. Message passing over discrete variables keeps a table of local potential values whose size can be prohibitive for natural images. We present a solution based on continuous variables with unary and pairwise Gaussian potentials, and perform probabilistic inference using loopy belief propagation on continuous Markov random fields. Experiments on the MNIST dataset show that our model outperforms vanilla GANs when more than two iterations of belief propagation are used.
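
    The continuous inference step described above can be illustrated with scalar Gaussian belief propagation in information form. The sketch below is a generic illustration under simplifying assumptions (scalar variables, a hand-written graph, a fixed flooding schedule and iteration count), not the thesis implementation; in the proposed framework the unary and pairwise parameters would be produced by the generator network.

    # A minimal sketch of scalar Gaussian (loopy) belief propagation in
    # information form on a pairwise MRF. Node i carries a unary potential
    # N(mu_i, 1/J_unary[i]) with h_unary[i] = J_unary[i] * mu_i, and each edge
    # (i, j) carries a coupling J_pair[(i, j)]. Here these parameters are plain
    # inputs; in the abstract's framework they would come from a network.
    from collections import defaultdict

    def gaussian_loopy_bp(J_unary, h_unary, edges, J_pair, n_iters=3):
        """Returns per-node marginal means and variances after n_iters sweeps."""
        neighbors = defaultdict(list)
        for i, j in edges:
            neighbors[i].append(j)
            neighbors[j].append(i)
        # Directed messages in information form, initialised to "uninformative".
        msg_J = {(i, j): 0.0 for i in neighbors for j in neighbors[i]}
        msg_h = {(i, j): 0.0 for i in neighbors for j in neighbors[i]}

        for _ in range(n_iters):
            new_J, new_h = {}, {}
            for (i, j) in msg_J:
                # Cavity distribution at i: unary plus all incoming messages except j's.
                J_cav = J_unary[i] + sum(msg_J[(k, i)] for k in neighbors[i] if k != j)
                h_cav = h_unary[i] + sum(msg_h[(k, i)] for k in neighbors[i] if k != j)
                J_ij = J_pair[(i, j)] if (i, j) in J_pair else J_pair[(j, i)]
                new_J[(i, j)] = -J_ij * J_ij / J_cav
                new_h[(i, j)] = -J_ij * h_cav / J_cav
            msg_J, msg_h = new_J, new_h

        means, variances = {}, {}
        for i in neighbors:
            J_i = J_unary[i] + sum(msg_J[(k, i)] for k in neighbors[i])
            h_i = h_unary[i] + sum(msg_h[(k, i)] for k in neighbors[i])
            means[i], variances[i] = h_i / J_i, 1.0 / J_i
        return means, variances

    # Tiny loopy example: a 4-node cycle with weak couplings.
    J_u = {i: 1.0 for i in range(4)}
    h_u = {0: 1.0, 1: 0.0, 2: -1.0, 3: 0.0}
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    J_p = {e: -0.2 for e in edges}
    print(gaussian_loopy_bp(J_u, h_u, edges, J_p, n_iters=3))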

    Learning Only On Boundaries: a Physics-Informed Neural operator for Solving Parametric Partial Differential Equations in Complex Geometries

    Recently, deep learning surrogates and neural operators have shown promise in solving partial differential equations (PDEs). However, they often require a large amount of training data and are limited to bounded domains. In this work, we present a novel physics-informed neural operator method for solving parametrized boundary value problems without labeled data. By reformulating the PDEs into boundary integral equations (BIEs), we can train the operator network solely on the boundary of the domain. This approach reduces the number of required sample points from O(N^d) to O(N^{d-1}), where d is the domain's dimension, leading to a significant acceleration of training. Additionally, our method can handle unbounded problems, which are out of reach for existing physics-informed neural networks (PINNs) and neural operators. Our numerical experiments demonstrate the method's effectiveness on parametrized complex geometries and unbounded problems.
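
    To illustrate why a boundary integral formulation only needs O(N^{d-1}) sample points, the sketch below sets up a boundary-only loss for a 2-D interior Laplace Dirichlet problem using an indirect single-layer representation. This is a hand-rolled illustration, not the paper's operator or loss: the uniform quadrature, the crude handling of the singular diagonal, and the placeholder density are all assumptions.

    # Boundary-only loss for the 2-D Laplace Dirichlet problem, using the
    # single-layer representation u(x) = \int_{boundary} G(x, y) sigma(y) dS(y)
    # with G(x, y) = -log|x - y| / (2*pi). The unknown density sigma lives only
    # on the 1-D boundary, which is where the O(N^{d-1}) saving comes from.
    import numpy as np

    def boundary_loss(sigma, boundary_pts, arc_weights, g_boundary):
        """Mean-squared residual of the boundary integral equation, with the
        collocation points taken to be the quadrature nodes themselves."""
        diff = boundary_pts[:, None, :] - boundary_pts[None, :, :]
        r = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(r, 1.0)                    # avoid log(0) at the singular point
        G = -np.log(r) / (2.0 * np.pi)
        np.fill_diagonal(G, 0.0)                    # crude: drop the self-contribution
        u_pred = G @ (sigma * arc_weights)          # quadrature of the single-layer potential
        return np.mean((u_pred - g_boundary) ** 2)

    # N nodes on the unit circle; boundary data g(x, y) = x. In the operator
    # setting, sigma would be predicted by a network from the problem parameters.
    N = 256
    theta = 2 * np.pi * np.arange(N) / N
    pts = np.stack([np.cos(theta), np.sin(theta)], axis=-1)
    weights = np.full(N, 2 * np.pi / N)             # arc-length weights on the circle
    sigma = np.zeros(N)                             # placeholder density
    print(boundary_loss(sigma, pts, weights, pts[:, 0]))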

    An Expert's Guide to Training Physics-informed Neural Networks

    Physics-informed neural networks (PINNs) have been popularized as a deep learning framework that can seamlessly synthesize observational data and partial differential equation (PDE) constraints. In practice, however, their effectiveness can be hampered by training pathologies, and often by poor choices made by users who lack deep learning expertise. In this paper we present a series of best practices that can significantly improve the training efficiency and overall accuracy of PINNs. We also put forth a set of challenging benchmark problems that highlight some of the most prominent difficulties in training PINNs, and present comprehensive and fully reproducible ablation studies that demonstrate how different architecture choices and training strategies affect the test accuracy of the resulting models. We show that the methods and guiding principles put forth in this study lead to state-of-the-art results and provide strong baselines that future studies should use for comparison. To this end, we also release a highly optimized library in JAX that can be used to reproduce all results reported in this paper, enable future research studies, and facilitate easy adaptation to new use-case scenarios. Comment: 36 pages, 25 figures, 13 tables.
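
    As a deliberately minimal example of the kind of training setup whose design choices such best practices address, the sketch below assembles a PINN residual loss in JAX for a 1-D Poisson problem. The network size, collocation grid, and loss weighting are illustrative assumptions, not the settings recommended in the paper or its released library.

    # PINN residual loss in JAX for u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0,
    # using the manufactured solution u(x) = sin(pi x). The relative weighting of
    # the PDE and boundary terms is exactly the kind of choice best practices target.
    import jax
    import jax.numpy as jnp

    def init_params(key, widths=(1, 32, 32, 1)):
        params = []
        for i in range(len(widths) - 1):
            key, sub = jax.random.split(key)
            W = jax.random.normal(sub, (widths[i], widths[i + 1])) / jnp.sqrt(widths[i])
            params.append((W, jnp.zeros(widths[i + 1])))
        return params

    def mlp(params, x):
        h = jnp.atleast_1d(x)
        for W, b in params[:-1]:
            h = jnp.tanh(h @ W + b)
        W, b = params[-1]
        return (h @ W + b)[0]

    def f(x):                                      # source term for u(x) = sin(pi x)
        return -(jnp.pi ** 2) * jnp.sin(jnp.pi * x)

    def loss(params, x_collocation):
        u = lambda x: mlp(params, x)
        u_xx = jax.vmap(jax.grad(jax.grad(u)))(x_collocation)
        pde_residual = jnp.mean((u_xx - f(x_collocation)) ** 2)
        bc_residual = u(0.0) ** 2 + u(1.0) ** 2
        return pde_residual + bc_residual          # relative weighting is a key tuning knob

    key = jax.random.PRNGKey(0)
    params = init_params(key)
    x_col = jnp.linspace(0.0, 1.0, 64)
    grads = jax.grad(loss)(params, x_col)          # feed into any optimizer of choice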

    Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers

    Vision transformers have achieved significant improvements on various vision tasks, but the quadratic interactions between tokens significantly reduce computational efficiency. Many pruning methods have recently been proposed to remove redundant tokens and make vision transformers more efficient. However, existing studies mainly focus on token importance to preserve local attentive tokens while completely ignoring global token diversity. In this paper, we emphasize the importance of diverse global semantics and propose an efficient token decoupling and merging method that jointly considers token importance and diversity for token pruning. Based on the class-token attention, we decouple the attentive and inattentive tokens. In addition to preserving the most discriminative local tokens, we merge similar inattentive tokens and match homogeneous attentive tokens to maximize token diversity. Despite its simplicity, our method achieves a promising trade-off between model complexity and classification accuracy. On DeiT-S, our method reduces FLOPs by 35% with only a 0.2% accuracy drop. Notably, benefiting from the maintained token diversity, our method can even improve the accuracy of DeiT-T by 0.1% after reducing its FLOPs by 40%.
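
    A rough sketch of the decoupling-and-merging idea is given below. It is a simplification, not the paper's algorithm: the keep ratio, the number of merged tokens, and the cosine-similarity grouping rule are assumptions chosen only to make the attentive/inattentive split and the diversity-preserving merge concrete.

    # Split patch tokens into attentive / inattentive groups by the class-token
    # attention, keep the attentive ones, and pool the inattentive ones into a
    # few merged tokens via cosine-similarity assignment to seed tokens.
    import numpy as np

    def decouple_and_merge(tokens, cls_attention, keep_ratio=0.5, n_merged=4):
        """tokens: (N, D) patch tokens (class token excluded);
        cls_attention: (N,) attention of the class token over the patch tokens."""
        N, D = tokens.shape
        k = max(1, int(round(keep_ratio * N)))
        order = np.argsort(-cls_attention)
        attentive = tokens[order[:k]]               # most discriminative tokens, kept
        inattentive = tokens[order[k:]]
        inat_attn = cls_attention[order[k:]]

        # Group inattentive tokens by similarity to the highest-attention
        # inattentive tokens, then average each group (attention-weighted).
        seeds = inattentive[:n_merged]
        normed = lambda x: x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-6)
        assign = (normed(inattentive) @ normed(seeds).T).argmax(axis=-1)
        merged = []
        for g in range(len(seeds)):
            mask = assign == g
            if not mask.any():
                continue
            w = inat_attn[mask] + 1e-6
            w = w / w.sum()
            merged.append((w[:, None] * inattentive[mask]).sum(axis=0))
        merged = np.stack(merged) if merged else np.zeros((0, D))
        return np.concatenate([attentive, merged], axis=0)

    # Usage on random tokens: 16 tokens of dimension 64 -> 8 kept + up to 4 merged.
    print(decouple_and_merge(np.random.randn(16, 64), np.random.rand(16)).shape)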

    Bilateral-Fuser: A Novel Multi-cue Fusion Architecture with Anatomical-aware Tokens for Fovea Localization

    Accurate localization of the fovea is one of the primary steps in analyzing retinal diseases, since it helps prevent irreversible vision loss. Although current deep learning-based methods achieve better performance than traditional methods, challenges remain, such as insufficient use of anatomical landmarks, sensitivity to diseased retinal images, and variable imaging conditions. In this paper, we propose a novel transformer-based architecture (Bilateral-Fuser) for multi-cue fusion. The architecture explicitly incorporates long-range connections and global features using retina and vessel distributions for robust fovea localization. We introduce a spatial attention mechanism in the dual-stream encoder to extract and fuse self-learned anatomical information. This design focuses on features distributed along blood vessels and significantly decreases computational cost by reducing the number of tokens. Comprehensive experiments show that the proposed architecture achieves state-of-the-art performance on two public datasets and one large-scale private dataset. We also show that the Bilateral-Fuser is more robust on both normal and diseased retina images and has better generalization capacity in cross-dataset experiments. Comment: This paper is prepared for IEEE TRANSACTIONS ON MEDICAL IMAGING.
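
    The sketch below shows one plausible shape of such a dual-stream, anatomy-aware fusion module, with vessel-guided tokens attending to image tokens via cross-attention. It is an assumption about the general architecture, not the authors' Bilateral-Fuser; the module names, dimensions, token counts, and pooling head are illustrative.

    # Vessel-stream tokens (few, sampled along the vessel map) query the dense
    # fundus-image tokens with cross-attention; the fused tokens are pooled and
    # regressed to an (x, y) fovea location.
    import torch
    import torch.nn as nn

    class DualStreamFusion(nn.Module):
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.fundus_proj = nn.Linear(dim, dim)
            self.vessel_proj = nn.Linear(dim, dim)
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.head = nn.Linear(dim, 2)           # (x, y) fovea coordinates

        def forward(self, fundus_tokens, vessel_tokens):
            """fundus_tokens: (B, N, dim) image-stream tokens;
            vessel_tokens: (B, M, dim) tokens sampled along the vessel map (M << N)."""
            q = self.vessel_proj(vessel_tokens)     # queries follow vessel anatomy
            kv = self.fundus_proj(fundus_tokens)
            fused, _ = self.cross_attn(q, kv, kv)   # spatial cross-attention fusion
            return self.head(fused.mean(dim=1))     # pooled tokens -> fovea location

    model = DualStreamFusion()
    pred = model(torch.randn(2, 196, 256), torch.randn(2, 32, 256))
    print(pred.shape)                               # torch.Size([2, 2])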